UMBC High Performance Computing Facility : HPC Quickstart
This page last changed on Mar 11, 2009 by straha1.
Hardware

HPC is a 35-node Opteron cluster (33 compute nodes, one dedicated debugging node that accepts jobs, and one login node) with several file servers, all connected by an InfiniBand interconnect. There is 14 TB of high-speed InfiniBand-connected storage, which has some built-in redundancy but is not backed up. There is 200 GB of backed-up shared storage for home directories. Each node has:
This cluster does not provide long-term storage, with the exception of your home directory (which has a very low quota). Additional cluster-accessible long-term storage is available separately from OIT for a yearly fee. See this page for more detailed information.

Your Account: Getting it and Using it

To get an account, submit a request using the account request form. To log in, simply use ssh:

    ssh username@hpc.rs.umbc.edu

and use your UMBC-wide username and password (the same as for MyUMBC). Transfer data to HPC using scp:

    scp some_file username@hpc.rs.umbc.edu:

You can use sftp or sshfs instead; anything that supports version 2 of the ssh protocol will work. By default, users have a quota of 100,000 KB and 10,000 files. Anything large should be stored in your ~/scratch/ or ~/common/ directories, which are symbolic links to the network-mounted, InfiniBand-accessed 14 TB storage areas on Sun "Thumpers". That data is not backed up, but it does have some redundancy to make it tolerant of a limited amount of hardware failure. You can rent additional, thoroughly backed-up storage space from OIT. See the Initial Setup of Your Account page for more details.

Software

The following software is available on HPC:
MPI Implementations

We have the following MPI options:
All three options work with C, C++, Fortran 77 and Fortran 90 with each of GCC and PGI. See this page for advice on choosing between them. You switch which compiler+MPI combination you're using by running the switcher command and then logging off and logging back in. If you don't log off and log back in, strange, unwanted things will happen. Switcher commands include:
Compiling MPI programs:
If you are using Fortran, replace ld with mpif90 or mpif77. If you are using C++, replace ld with mpicxx. C programs can use any of the four executables in place of ld. Mixing C++ and Fortran can be complicated; avoid it if you can.

How you run MPI programs varies between the three implementations: OpenMPI, MVAPICH and MVAPICH2. Preliminary benchmarks have shown that MVAPICH2 is the fastest and OpenMPI is the slowest by a wide margin. MVAPICH2 is more complicated to use; see this page for more details.

PBS Queuing System

HPC uses the PBS queuing system. Jobs are submitted using qsub, monitored with qstat and canceled with qdel. Currently, interactive jobs are not allowed. We have three queues: testing, low_priority and high_priority. The testing queue uses the dedicated debug node. Please use the testing queue to test any new programs before running them on the cluster nodes. Details about the differences between the queues are available on this page: Queues on HPC.

A simple example qsub script:

    #!/bin/bash
    : put no lines beginning with # before the #PBS lines other than the /bin/bash line
    #PBS -N 'hello_parallel'
    #PBS -o 'log.file.for.stdout'
    #PBS -e 'log.file.for.stderr'
    #PBS -W umask=007
    #PBS -q low_priority
    #PBS -l nodes=5:ppn=4
    #PBS -m bea
    cd "$PBS_O_WORKDIR"
    : The exact format of the call to mpirun varies between implementations:
    mpirun --machinefile "$PBS_NODEFILE" --np 20 ./my_program_name

Look here for an explanation of those options (or run man qsub). There is one important fact you must remember: nodes=5 does not mean five machines. It means five groups of ppn processor cores, where all processor cores in a group are on the same machine. Thus if you type nodes=5:ppn=4 you'll get five machines all to yourself (since all of our cluster nodes have four processor cores), for a total of 5 x 4 = 20 cores, which is why the example passes --np 20 to mpirun. If you omit the ppn=4 then you'll get five processor cores somewhere on the cluster.
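The nodes and ppn arithmetic can be checked from inside a job rather than hard-coded. The following is a minimal sketch, assuming that PBS writes one line per allocated processor core to $PBS_NODEFILE; the helper name count_slots and the host names node001 to node005 are hypothetical and only serve the demonstration.

```shell
#!/bin/bash
# Sketch: derive the mpirun process count from the node file.
# Assumption: $PBS_NODEFILE holds one line per allocated processor core.

# count_slots FILE: print the number of non-empty lines in FILE.
count_slots() {
    grep -c . "$1"
}

# Outside of PBS, build a stand-in node file for nodes=5:ppn=4.
# The host names below are hypothetical.
demo_nodefile=$(mktemp)
for n in 1 2 3 4 5; do
    for core in 1 2 3 4; do
        printf 'node%03d\n' "$n"
    done
done > "$demo_nodefile"

np=$(count_slots "$demo_nodefile")                    # total processor cores
machines=$(sort -u "$demo_nodefile" | grep -c .)      # distinct machines
echo "cores=$np machines=$machines"
rm -f "$demo_nodefile"

# Inside a real job script the same helper could feed mpirun, for example:
#   mpirun --machinefile "$PBS_NODEFILE" --np "$(count_slots "$PBS_NODEFILE")" ./my_program_name
```

Counting lines this way keeps the process count in step with the #PBS -l request if nodes or ppn is changed later.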
Submit that script using:

    qsub name-of-your-script

That will give you a job number (such as 4006.hpc.cl.rs.umbc.edu) which you can then use with qstat and qdel:
Look here for more details. Note that if you forget your job number, you can run qstat and it will list all jobs (including yours) with the job numbers in the first column.
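One way to avoid forgetting the job number is to save what qsub prints when the job is submitted. This is a rough sketch, assuming qsub writes the identifier to standard output in the form shown above; the helper short_id is hypothetical, and treating the part before the first dot as a usable short form is also an assumption.

```shell
#!/bin/bash
# Sketch: keep the identifier printed by qsub so it can be reused later.
# Assumption: qsub prints something like 4006.hpc.cl.rs.umbc.edu on stdout.

# short_id ID: print the part of ID before the first dot.
short_id() {
    printf '%s\n' "${1%%.*}"
}

# On the cluster this would be:  job_id=$(qsub name-of-your-script)
# Here the example identifier from the text is used unless one is passed in.
job_id=${1:-4006.hpc.cl.rs.umbc.edu}

echo "job id: $job_id"
echo "short form: $(short_id "$job_id")"

# With the identifier saved, the job can be inspected or canceled:
#   qstat "$job_id"
#   qdel  "$job_id"
```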
Document generated by Confluence on Mar 31, 2011 15:37